Due 2/2 by the start of class.

Working in Teams: Stepping Stones progress

For the coming week, teams should start the following:

  1. For the two already-collected measures they’ve been assigned, start drafting the purpose (how is this measure useful?) and potential policy contexts or changes that may be relevant to understanding the measures. These are just draft ideas to get us thinking; they don’t need to be fully refined or fully researched/informed. We’re just thinking about what else we might need to know to appropriately interpret and contextualize the measures.
  2. Write an initial R script to download the first measure in their set, and review the city Excel file for background and issues.
  3. Begin work on a request for any measure not available online (e.g., identify who to contact, draft an email). We won’t reach out to request data yet, as some teams may need to coordinate and we want to run our plans by our city partners.

This work should be submitted to the Stepping Stones channel in Slack so we can better learn from one another.

Working in R

Using the police stops data from Charlottesville we shared in the first class (also downloadable from GitHub), write a script that does the following:

  1. Loads tidyverse and janitor and reads in the police stops data (use the argument for column types we used in class) and cleans the variable names.
  2. Examine the data again (e.g., things like names and variable structures and summaries). Based on your examination, what’s the most common action taken in police stops in Charlottesville?
  3. Generate a new variable for age – call it age_recode – and set it equal to missing (NA) if age is 0 and equal to the value of age otherwise (hint: be sure to save this variable to the data frame; that is, assign the data frame to itself before piping into dplyr functions). How frequently is the age variable missing (0)?
  4. Practice using the dplyr functions filter, select, arrange, and count. Generate code that answers the following questions using these functions (hint: you won’t need all of them for an answer, but use only some combination of them to generate the answer. This time, printing the answer to the console is enough; you don’t need to save the result into a named object).
    1. Looking only at Terry stops (reason_for_stop) made by the Charlottesville police (agency_name), what is the distribution of race among individuals stopped?
    2. Looking only at Terry stops (reason_for_stop) made by the Charlottesville police (agency_name), show the race and action taken for all observations (hint: you can add print(n = X) to a set of piped functions to show X values in the console).
  5. Practice using dplyr functions mutate, group_by, summarize and filter. Generate code that answers the following questions using a combination from among these functions.
    1. Among stops where age is present (hint: where !is.na(age_recode) or age != 0), how many stops are there and what is the minimum, mean, and maximum age of those stopped by reason for the stop?
    2. Among Terry stops, generate the proportion of actions taken; that is, what proportion of Terry stops result in an arrest, a warning, a citation, or no enforcement?
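
If the patterns in steps 3 and 5 feel abstract, here is a minimal sketch on a small made-up data frame. Everything in it is an assumption for illustration: the values are invented, and the action_taken column name and the "Terry stop" label may not match what you find in the real data after cleaning the names, so check your own names and values first.

```r
library(dplyr)

# Toy data: invented values; column names and labels are assumptions,
# not the actual contents of the Charlottesville file
stops <- tibble::tibble(
  reason_for_stop = c("Terry stop", "Traffic violation", "Terry stop",
                      "Terry stop", "Traffic violation", "Terry stop"),
  action_taken    = c("Arrest", "Warning", "Warning",
                      "Arrest", "Citation", "No enforcement"),
  age             = c(25, 0, 40, 31, 19, 0)
)

# Recode 0 to NA and assign the result back to the data frame
# so the new variable is saved
stops <- stops %>%
  mutate(age_recode = if_else(age == 0, NA_real_, age))

# How often is the recoded age missing?
stops %>%
  count(is.na(age_recode))

# Number of stops and min/mean/max age by group, among rows with age present
age_summary <- stops %>%
  filter(!is.na(age_recode)) %>%
  group_by(reason_for_stop) %>%
  summarize(
    n        = n(),
    min_age  = min(age_recode),
    mean_age = mean(age_recode),
    max_age  = max(age_recode)
  )

# Proportion of each action within one subset of stops
action_props <- stops %>%
  filter(reason_for_stop == "Terry stop") %>%
  count(action_taken) %>%
  mutate(prop = n / sum(n))

age_summary
action_props
```

The results are saved into named objects here only so they are easy to inspect; for the questions above, printing to the console is enough where the instructions say so.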

Save the script into the scripts folder of your learningR folder. When complete, submit this file to me via direct message on Slack (give it a name like week2_mpc.R, as I’ll be adding everyone’s to the same script folder in my own version of this folder!).

Artwork by @allison_horst
